The Rationale for Building an Ontology Expressly for NLP
Abstract
In this paper we argue for the need for NLP-specific resources to support truly high-level, semantically oriented applications. We describe what, in our experience, constitutes useful knowledge for such applications and why most extant resources are not sufficient for this purpose, which led our Ontological Semantics group to build its own. We suggest that extensive time and energy are being spent on resources for NLP, not on developing ones of higher utility but, rather, on trying to discover ways of using less-than-ideal ones. We believe that a more useful long-term approach to the problem of knowledge acquisition for NLP would be to acquire what is needed from the outset, since it is likely that in the end such work will prove necessary anyway.

Introduction

A frequent question asked of our Ontological Semantics (OntoSem) group is: what available knowledge resources do you use? WordNet? FrameNet? XTAG? The question is valid: a number of research groups are building resources that are claimed to have, if not primary, then secondary applicability to natural language processing (NLP). So, if one were to assume that knowledge is knowledge (with the implication that any and all knowledge is valuable), then one would expect the developers of a knowledge-based system like OntoSem to voraciously incorporate everything available.

We, however, do not do this, because past attempts to incorporate resources that were not built explicitly to support semantically rich text processing were less time-efficient than starting from scratch; and, in a practical, application-oriented environment like OntoSem, the potential theoretical insights from experiments in resource merging become secondary to the practical necessity of building systems. Thus, we have been developing a suite of interconnected static resources and processors that are specifically targeted at high-end applications.
In this paper we present a brief overview of OntoSem, describe why a number of the most widely reported resources are less applicable to NLP than is widely believed and hoped, and present the opinion that, as a field, we should develop resources that are truly sufficient for high-end NLP rather than spend the same significant amount of time and effort attempting to utilize resources borrowed from other fields or developed for other purposes, with inevitably inferior results.

A Snapshot of Ontological Semantics

OntoSem is a text processing environment that takes unrestricted raw text as input and carries out tokenization, morphological analysis, syntactic analysis, and semantic analysis to yield formal text-meaning representations (TMRs). Text analysis relies on:

• the OntoSem language-independent ontology, which is represented using its own metalanguage and currently contains around 5,500 concepts, each described by an average of 16 properties (“features”) selected from the hundreds of properties defined in the ontology; the number of concepts is intentionally restricted, so that mappings from lexicons are many-to-one;
• an OntoSem lexicon for each language processed, whose entries contain (among other information) syntactic and semantic zones (linked through special variables) as well as procedural-semantic attachments that we call “meaning procedures”; the semantic zone most frequently invokes ontological concepts, either directly or with modifications, but can also describe word meaning extraontologically, for example, in terms of parameterized values of modality, aspect, time, etc., or combinations thereof;
• a fact repository, which contains real-world facts represented as numbered “remembered instances” of ontological concepts (e.g., SPEECH-ACT-3186 is the 3186th instantiation of the concept SPEECH-ACT in the world model constructed during text processing as the embodiment of text meaning);
• the OntoSem text analyzers, covering everything from tokenization to TMR creation;
• the TMR language, which is the metalanguage for representing text meaning, compatible with the metalanguage of the ontology and the fact repository.

Details of this approach to text processing can be found, e.g., in Nirenburg and Raskin (forthcoming) and Nirenburg et al. (2003). The ontology itself, a brief ontology tutorial, and an extensive lexicon tutorial can be viewed at http://ilit.umbc.edu.

TMRs represent, to our knowledge, the most semantically rich, automatically generated expressions of text meaning of any extant system. They require detailed lexical and world knowledge, most of which must be manually acquired. Many believe that manual knowledge acquisition is too expensive to be feasible, so they work on circumventing this problem: some groups attempt to maximize the use of noisy knowledge in NLP applications, e.g., Krymolowski and Roth (1998); and numerous groups attempt to adapt WordNet for use in NLP (especially with respect to problems of ambiguity): e.g., Mihalcea and Moldovan (2001) automatically generate a more coarse-
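
The fact repository described in the snapshot above, with its numbered “remembered instances” of ontological concepts such as SPEECH-ACT-3186, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not OntoSem's actual metalanguage or implementation: the class names `RememberedInstance` and `FactRepository`, the methods `remember` and `lookup`, the `AGENT` property, and the two-concept ontology are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RememberedInstance:
    """One numbered instance of an ontological concept (hypothetical structure)."""
    concept: str
    index: int
    properties: Dict[str, object] = field(default_factory=dict)

    @property
    def name(self) -> str:
        # Mirrors the naming pattern in the text, e.g. SPEECH-ACT-3186.
        return f"{self.concept}-{self.index}"


class FactRepository:
    """Stores instances and numbers them per concept (assumed behavior)."""

    def __init__(self, ontology_concepts):
        self._concepts = set(ontology_concepts)
        self._last_index = defaultdict(int)  # per-concept instance counter
        self._facts = {}                     # instance name -> instance

    def remember(self, concept: str,
                 properties: Optional[Dict[str, object]] = None) -> RememberedInstance:
        # Only concepts defined in the ontology may be instantiated.
        if concept not in self._concepts:
            raise KeyError(f"unknown concept: {concept}")
        self._last_index[concept] += 1
        instance = RememberedInstance(
            concept, self._last_index[concept], dict(properties or {})
        )
        self._facts[instance.name] = instance
        return instance

    def lookup(self, name: str) -> RememberedInstance:
        return self._facts[name]


# Hypothetical two-concept ontology; AGENT is an assumed property name.
repo = FactRepository({"SPEECH-ACT", "HUMAN"})
first = repo.remember("SPEECH-ACT", {"AGENT": "HUMAN-1"})
second = repo.remember("SPEECH-ACT")
print(first.name, second.name)  # SPEECH-ACT-1 SPEECH-ACT-2
```

Numbering per concept rather than globally is a design choice made here to match the SPEECH-ACT-3186 example; the source does not say how the counter is maintained.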
Publication date: 2004